15 research outputs found

    Multiple description video coding for stereoscopic 3D

    In this paper, we propose a multiple description coding (MDC) scheme for stereoscopic 3D video. In the literature, MDC has previously been applied to 2D video but rarely to 3D video. The proposed algorithm enhances the error resilience of 3D video over error-prone networks by combining even- and odd-frame-based MDC while retaining good temporal prediction efficiency. The original even and odd frame MDC scheme is improved by adding a controllable amount of side information to aid frame interpolation at the decoder. The side information is also sent according to the motion in the video sequence for further improvement. The performance of the proposed algorithms is evaluated in both error-free and error-prone environments, particularly wireless channels. Simulation results show improved performance for the proposed MDC at high error rates compared with single description coding (SDC) and the original even and odd frame MDC.
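The even/odd splitting and decoder-side interpolation described above can be sketched as follows. This is a minimal illustration on raw numeric frames, assuming a simple neighbour-averaging interpolator; a real codec would operate on compressed bitstreams, and the function names are illustrative, not from the paper.

```python
# Sketch of even/odd frame multiple description coding (MDC): the sequence is
# split into two independently transmitted descriptions, and frames from a lost
# description are interpolated from their received neighbours at the decoder.
import numpy as np

def split_descriptions(frames):
    """Split a frame sequence into two descriptions: even- and odd-indexed frames."""
    return frames[0::2], frames[1::2]

def reconstruct(even, odd, n_frames, even_lost=False, odd_lost=False):
    """Merge the received descriptions; interpolate frames whose description
    was lost (handles the single-description-loss case)."""
    out = [None] * n_frames
    if not even_lost:
        for i, f in enumerate(even):
            out[2 * i] = f
    if not odd_lost:
        for i, f in enumerate(odd):
            out[2 * i + 1] = f
    # Decoder-side frame interpolation: average the nearest received neighbours.
    for t in range(n_frames):
        if out[t] is None:
            prev = out[t - 1] if t > 0 else out[t + 1]
            nxt = out[t + 1] if t + 1 < n_frames and out[t + 1] is not None else prev
            out[t] = (prev + nxt) / 2.0
    return out
```

The side information mentioned in the abstract would refine exactly this interpolation step, at the cost of extra rate.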

    RUSHES—an annotation and retrieval engine for multimedia semantic units

    Multimedia analysis and reuse of raw, unedited audio-visual content, known as rushes, is gaining acceptance among a large number of research labs and companies. Several European-funded research projects address multimedia indexing, annotation, search, and retrieval, but only the FP6 project RUSHES focuses on automatic semantic annotation, indexing, and retrieval of raw and unedited audio-visual content. Professional content creators and providers, as well as home users, deal with this type of content, so novel technologies for semantic search and retrieval are required. In this paper, we present a summary of the most relevant achievements of the RUSHES project, focusing on specific approaches for automatic annotation as well as the main features of the final RUSHES search engine.

    Compressed video communications


    Robust human face tracking in eigenspace for perceptual human-robot interaction

    This chapter introduces a robust human face tracking scheme for vision-based human-robot interaction, in which detected face-like regions in the video sequence are tracked using an unscented Kalman filter (UKF), and face occlusions are handled by an online appearance-based scheme using principal component analysis (PCA). Experiments on standard test videos validate that the proposed PCA-based face tracking attains robust performance in handling face occlusions.
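The appearance-based occlusion handling can be illustrated with a toy sketch: a PCA eigenspace is learned from un-occluded face patches, and a high reconstruction error on a new patch signals occlusion. The component count and threshold here are illustrative assumptions, not values from the chapter.

```python
# Minimal PCA appearance model: patches that fit the learned face subspace
# reconstruct with low error; occluded patches reconstruct poorly.
import numpy as np

def fit_pca(patches, n_components=5):
    """Learn a PCA eigenspace from flattened face patches (rows of X)."""
    X = np.asarray(patches, dtype=float)
    mean = X.mean(axis=0)
    # SVD of the centred data yields the principal components (rows of vt).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(patch, mean, components):
    """Project a patch into the eigenspace and measure how much is lost."""
    centred = np.asarray(patch, dtype=float) - mean
    coeffs = components @ centred
    recon = components.T @ coeffs
    return float(np.linalg.norm(centred - recon))

def is_occluded(patch, mean, components, threshold):
    return reconstruction_error(patch, mean, components) > threshold
```

In a tracker, the UKF would supply the candidate patch location each frame, and this check would decide whether to trust the measurement or coast on the motion model.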

    Nonrigid Structure-From-Motion From 2-D Images Using Markov Chain Monte Carlo

    In this paper, we present a new method for simultaneously determining the 3-D shape and motion of a nonrigid object from uncalibrated 2-D images without assuming the distribution characteristics. A nonrigid motion can be treated as a combination of a rigid rotation and a nonrigid deformation. To recover deformable structures accurately, we estimate the probability distribution function of the corresponding features through random sampling, incorporating an established probabilistic model. The fit between the observations and the projection of the estimated 3-D structure is evaluated using a Markov chain Monte Carlo based expectation-maximization algorithm. Applications of the proposed method to both synthetic and real image sequences are demonstrated with promising results.
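The core sampling ingredient of the estimator above is Markov chain Monte Carlo. As a hedged illustration only, the toy Metropolis-Hastings sampler below draws from a 1-D target density; the step size and target here are illustrative assumptions and far simpler than the paper's feature-distribution setting.

```python
# Random-walk Metropolis-Hastings: propose a Gaussian step, accept with
# probability min(1, p(proposal)/p(current)), and record the chain state.
import math, random

def metropolis_hastings(log_density, x0, n_samples, step=0.5, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)          # symmetric proposal
        log_accept = log_density(proposal) - log_density(x)
        if log_accept >= 0 or math.log(rng.random()) < log_accept:
            x = proposal                              # accept the move
        samples.append(x)
    return samples

# Example: sample from a standard normal via its (unnormalized) log-density.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=5000)
```

Within an MCMC-EM scheme, such samples stand in for the intractable expectation in the E-step.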

    Multimodal Biometric Human Recognition for Perceptual Human-Computer Interaction

    In this paper, a novel video-based multimodal biometric verification scheme using subspace-based low-level feature fusion of face and speech is developed for speaker recognition in perceptual human-computer interaction (HCI). In the proposed scheme, the human face is tracked and the face pose is estimated to weight the detected face-like regions in successive frames, where ill-posed faces and false-positive detections are assigned lower credit to enhance accuracy. In the audio modality, mel-frequency cepstral coefficients are extracted for voice-based biometric verification. In the fusion step, features from both modalities are projected into a nonlinear Laplacian Eigenmap subspace for multimodal speaker recognition and combined at the low level. The proposed approach is tested on a video database of ten human subjects, and the results show that it attains better accuracy than both conventional multimodal fusion using latent semantic analysis and the single-modality verifications. Experiments in MATLAB show the potential of the proposed scheme to attain real-time performance in perceptual HCI applications.
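Low-level feature fusion, as opposed to fusing per-modality decision scores, can be sketched minimally as below. This is an assumption-laden simplification: the paper projects into a Laplacian Eigenmap subspace, which is replaced here by plain per-modality z-score normalisation before concatenation.

```python
# Low-level (feature-level) fusion: normalise each modality's feature vector,
# then concatenate into a single vector for the downstream classifier.
import numpy as np

def fuse_features(face_feats, speech_feats, eps=1e-8):
    """Z-score each modality so neither dominates by scale, then concatenate."""
    def zscore(v):
        v = np.asarray(v, dtype=float)
        return (v - v.mean()) / (v.std() + eps)
    return np.concatenate([zscore(face_feats), zscore(speech_feats)])
```

Normalising before concatenation matters because raw face and MFCC features live on very different numeric scales.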

    Advances in Video Summarization and Skimming

    This chapter summarizes recent advances in video abstraction for fast content browsing, skimming, transmission, and retrieval of massive video databases, which are in demand in many applications such as web multimedia, mobile multimedia, interactive TV, and emerging 3D TV. Video summarization and skimming aim to provide an abstract of a long video that shortens navigation and browsing of the original. The challenge of video summarization is to effectively extract certain content of the video while preserving its essential messages. In this chapter, preliminaries on video temporal structure analysis are introduced, various video summarization schemes, such as those using low-level features, motion descriptors, and Eigen-features, are described, and case studies on two practical summarization schemes are presented with experimental results.
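One of the low-level-feature schemes mentioned above can be sketched as shot-change keyframe selection: a frame is kept when its colour histogram differs sharply from its predecessor's. The bin count and threshold below are illustrative assumptions, not values from the chapter.

```python
# Keyframe extraction by histogram difference: large L1 distance between
# consecutive normalised histograms flags a content change worth summarizing.
import numpy as np

def histogram(frame, bins=16):
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def keyframes(frames, threshold=0.3):
    """Return indices of frames that start a new shot-like segment."""
    keep = [0]  # the first frame always anchors the summary
    prev = histogram(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = histogram(frame)
        # L1 distance between successive histograms flags a content change.
        if np.abs(cur - prev).sum() > threshold:
            keep.append(i)
        prev = cur
    return keep
```

A skim would then be assembled by stitching short clips around each selected index.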

    An error resilience method for depth in stereoscopic 3D video

    Error-resilient stereoscopic 3D video coding can ensure robust 3D video communication, especially over high-error-rate wireless channels. In this paper, an error resilience method based on data partitioning is proposed for the depth data of stereoscopic 3D video. Although data partitioning is available for 2D video, its extension to depth information has not been investigated in the context of stereoscopic 3D video. Simulation results show that the depth data is less sensitive to errors and should therefore be partitioned towards the end of the data partition block. The partitioned depth data is then applied to an error resilience method, namely multiple description coding (MDC), to code the 2D video and the depth information. Simulation results show improved performance for the proposed depth partitioning on MDC compared with the original MDC in an error-prone environment.
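The partitioning idea above can be sketched as ordering syntax elements by error sensitivity, with the error-tolerant depth data placed last so that tail losses hit it first. The partition names and the byte-level loss model below are illustrative assumptions, not the paper's actual bitstream syntax.

```python
# Sketch of data partitioning: partitions are concatenated in decreasing order
# of error sensitivity, so losses at the tail of the block damage depth data
# before the critical header and motion partitions.
def build_partition_block(headers, motion, texture, depth):
    """Concatenate partitions from most to least error-sensitive."""
    return [("headers", headers), ("motion", motion),
            ("texture", texture), ("depth", depth)]

def drop_tail(block, n_lost_bytes):
    """Simulate a tail loss: later partitions (depth first) are truncated
    before the earlier, more critical ones."""
    kept, budget = [], sum(len(p) for _, p in block) - n_lost_bytes
    for name, payload in block:
        take = max(0, min(len(payload), budget))
        kept.append((name, payload[:take]))
        budget -= len(payload)
    return kept
```

Under this ordering, a burst of tail errors degrades only depth quality while the 2D video remains decodable.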